Loading the data
So, let's load some data. Since it is the topic of this lecture series, why not do a bibliographic mapping of the “innovation system” and “innovation ecosystem” literature? Here I use the Web of Science database of scientific literature, from which I downloaded the results of the following query.
- Data source: Clarivate Analytics Web of Science (http://apps.webofknowledge.com)
- Data format: bibtex
- Query: TOPIC: (“innovation system” OR “systems of innovation” OR “innovation ecosystem”)
- Timespan: the beginning of time - March 2019
- Document Type: Articles
- Language: English
- Query date: March, 2019
- Selection: 1000 most cited
We now just read the plain data and convert it with the built-in convert2df() function.
M <- readFiles("../input/wos_1.bib", "../input/wos_2.bib") %>%
  convert2df(dbsource = "isi",
             format = "bibtex")
##
## Converting your isi collection into a bibliographic dataframe
##
## Articles extracted 100
## Articles extracted 200
## Articles extracted 300
## Articles extracted 400
## Articles extracted 500
## Articles extracted 600
## Articles extracted 700
## Articles extracted 800
## Articles extracted 900
## Articles extracted 1000
## Done!
##
##
## Generating affiliation field tag AU_UN from C1: Done!
M %>% head()
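To get a feeling for how such a bibliographic data frame is organised, here is a minimal sketch on a toy stand-in. The object `toy` and its values are invented for illustration; only the column names AU (authors), PY (publication year) and TC (times cited) follow the Web of Science field tags used in `M`. The point is that multi-valued fields are stored as one string per document, separated by ";".

```r
# Toy stand-in for the converted data frame `M` (all values are invented).
toy <- data.frame(
  AU = c("SMITH J;DOE A", "DOE A", "LEE K;SMITH J;DOE A"),
  PY = c(2004, 2007, 2010),
  TC = c(120, 45, 10),
  stringsAsFactors = FALSE
)

# Split the multi-valued author field into one character vector per document.
authors <- strsplit(toy$AU, ";", fixed = TRUE)

# Number of authors per document.
n_authors <- lengths(authors)
print(n_authors)

# Number of documents per author, most productive first.
author_counts <- sort(table(unlist(authors)), decreasing = TRUE)
print(author_counts)
```

The same split-and-count pattern applies to the other multi-valued fields, which is why most functions below take a `sep = ";"` argument.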
Descriptive Analysis
Although bibliometrics is mainly known for quantifying the scientific production and measuring its quality and impact, it is also useful for displaying and analysing the intellectual, conceptual and social structures of research as well as their evolution and dynamical aspects.
In this way, bibliometrics aims to describe how specific disciplines, scientific domains, or research fields are structured and how they evolve over time. In other words, bibliometric methods help to map the science (so-called science mapping) and are very useful in the case of research synthesis, especially for the systematic ones.
Bibliometrics is founded on a set of statistical methods that can be used to quantitatively analyze large bodies of scientific literature and their evolution over time. Network structures are often used to model the interactions among authors, papers/documents/articles, references, keywords, etc.
Bibliometrix is open-source software for automating the stages of data analysis and data visualization. After converting and loading bibliographic data into R, Bibliometrix performs a descriptive analysis and different research-structure analyses.
Descriptive analysis provides some snapshots of the annual research development, the top “k” most productive authors, papers, and countries, and the most relevant keywords.
Main findings about the collection
results <- biblioAnalysis(M)
summary(results,
        k = 20,
        pause = FALSE)
##
##
## Main Information about data
##
## Documents 1000
## Sources (Journals, Books, etc.) 288
## Keywords Plus (ID) 870
## Author's Keywords (DE) 1172
## Period 1975 - 2018
## Average citations per documents 71.11
##
## Authors 1830
## Author Appearances 2397
## Authors of single-authored documents 215
## Authors of multi-authored documents 1615
## Single-authored documents 251
##
## Documents per Author 0.546
## Authors per Document 1.83
## Co-Authors per Documents 2.4
## Collaboration Index 2.16
##
## Document types
## ARTICLE 411
## ARTICLE, PROCEEDINGS PAPER 47
## EDITORIAL MATERIAL 5
## PROCEEDINGS PAPER 2
## REVIEW 34
## REVIEW, BOOK CHAPTER 1
##
##
## Annual Scientific Production
##
## Year Articles
## 1975 1
## 1990 1
## 1991 1
## 1992 4
## 1993 2
## 1994 3
## 1995 7
## 1996 3
## 1997 5
## 1998 15
## 1999 12
## 2000 19
## 2001 26
## 2002 34
## 2003 32
## 2004 31
## 2005 35
## 2006 31
## 2007 46
## 2008 55
## 2009 74
## 2010 60
## 2011 90
## 2012 71
## 2013 75
## 2014 73
## 2015 84
## 2016 62
## 2017 33
## 2018 15
##
## Annual Percentage Growth Rate 9.787999
##
##
## Most Productive Authors
##
## Authors Articles Authors Articles Fractionalized
## 1 HEKKERT MP 25 HEKKERT MP 8.62
## 2 KLERKX L 14 COOKE P 5.00
## 3 COENEN L 11 LEYDESDORFF L 5.00
## 4 TRUFFER B 11 KLERKX L 4.55
## 5 LEYDESDORFF L 10 MOWERY DC 4.50
## 6 JACOBSSON S 9 JACOBSSON S 4.20
## 7 NEGRO SO 9 COENEN L 4.08
## 8 COOKE P 8 WONGLIMPIYARAT J 4.00
## 9 LEEUWIS C 8 TRUFFER B 3.82
## 10 MARKARD J 8 CHEN SH 3.50
## 11 DOLOREUX D 7 FREEMAN C 3.50
## 12 ARCHIBUGI D 6 FRITSCH M 3.50
## 13 GUAN J 6 HUNG SC 3.50
## 14 HARMAAKORPI V 6 DOLOREUX D 3.33
## 15 ISAKSEN A 6 TRIPPL M 3.33
## 16 LEHRER M 6 HARMAAKORPI V 3.17
## 17 TRIPPL M 6 LEHRER M 3.17
## 18 BERGEK A 5 DIEZ JR 3.08
## 19 BINZ C 5 KITAGAWA F 3.00
## 20 DIEZ JR 5 MOTOHASHI K 3.00
##
##
## Top manuscripts per citations
##
## Paper TC TCperYear
## 1 GEELS FW, 2004, RES POLICY 950 63.3
## 2 FREEMAN C, 1995, CAMBR J ECON 869 36.2
## 3 MALERBA F, 2002, RES POLICY 840 49.4
## 4 COOKE P, 1997, RES POLICY 836 38.0
## 5 HEKKERT MP, 2007, TECHNOL FORECAST SOC CHANG 708 59.0
## 6 BERGEK A, 2008, RES POLICY 571 51.9
## 7 ASHEIM BT, 2005, RES POLICY 554 39.6
## 8 PITTAWAY L, 2004, INT J MANAG REV 544 36.3
## 9 LUNDVALL BA, 2002, RES POLICY 488 28.7
## 10 MOULAERT F, 2003, REG STUD 443 27.7
## 11 JACOBSSON S, 2000, ENERGY POLICY 398 20.9
## 12 MEYER-KRAHMER F, 1998, RES POLICY 390 18.6
## 13 MULLER E, 2001, RES POLICY 375 20.8
## 14 COOKE P, 1992, GEOFORUM 343 12.7
## 15 ADNER R, 2006, HARV BUS REV 317 24.4
## 16 BUNNELL TG, 2001, PROG HUM GEOGR 293 16.3
## 17 LIU XL, 2001, RES POLICY 292 16.2
## 18 CHRISTENSEN JF, 2005, RES POLICY 257 18.4
## 19 COLOMBO MG, 2002, RES POLICY 246 14.5
## 20 CARAYANNIS EG, 2009, INT J TECHNOL MANAGE 245 24.5
##
##
## Corresponding Author's Countries
##
## Country Articles Freq SCP MCP MCP_Ratio
## 1 UNITED KINGDOM 76 0.1529 41 35 0.461
## 2 NETHERLANDS 54 0.1087 38 16 0.296
## 3 USA 45 0.0905 32 13 0.289
## 4 GERMANY 41 0.0825 30 11 0.268
## 5 SWEDEN 36 0.0724 22 14 0.389
## 6 CHINA 25 0.0503 14 11 0.440
## 7 CANADA 23 0.0463 14 9 0.391
## 8 ITALY 21 0.0423 15 6 0.286
## 9 AUSTRIA 16 0.0322 13 3 0.188
## 10 FINLAND 15 0.0302 12 3 0.200
## 11 JAPAN 14 0.0282 11 3 0.214
## 12 SPAIN 14 0.0282 9 5 0.357
## 13 SWITZERLAND 13 0.0262 6 7 0.538
## 14 DENMARK 12 0.0241 9 3 0.250
## 15 FRANCE 12 0.0241 8 4 0.333
## 16 NORWAY 12 0.0241 9 3 0.250
## 17 TAIWAN 12 0.0241 9 3 0.250
## 18 KOREA 9 0.0181 8 1 0.111
## 19 BELGIUM 6 0.0121 4 2 0.333
## 20 ISRAEL 5 0.0101 3 2 0.400
##
##
## SCP: Single Country Publications
##
## MCP: Multiple Country Publications
##
##
## Total Citations per Country
##
## Country Total Citations Average Article Citations
## 1 UNITED KINGDOM 7033 92.5
## 2 NETHERLANDS 4864 90.1
## 3 SWEDEN 3275 91.0
## 4 USA 2721 60.5
## 5 GERMANY 2712 66.1
## 6 ITALY 2319 110.4
## 7 DENMARK 1442 120.2
## 8 CHINA 1327 53.1
## 9 CANADA 1308 56.9
## 10 FRANCE 1185 98.8
## 11 AUSTRIA 1177 73.6
## 12 SWITZERLAND 663 51.0
## 13 NORWAY 626 52.2
## 14 JAPAN 591 42.2
## 15 FINLAND 549 36.6
## 16 TAIWAN 523 43.6
## 17 SPAIN 467 33.4
## 18 KOREA 409 45.4
## 19 BELGIUM 389 64.8
## 20 SINGAPORE 339 113.0
##
##
## Most Relevant Sources
##
## Sources Articles
## 1 RESEARCH POLICY 125
## 2 TECHNOLOGICAL FORECASTING AND SOCIAL CHANGE 70
## 3 EUROPEAN PLANNING STUDIES 46
## 4 TECHNOVATION 39
## 5 ENERGY POLICY 33
## 6 SCIENTOMETRICS 31
## 7 TECHNOLOGY ANALYSIS \\& STRATEGIC MANAGEMENT 29
## 8 REGIONAL STUDIES 25
## 9 INTERNATIONAL JOURNAL OF TECHNOLOGY MANAGEMENT 22
## 10 JOURNAL OF CLEANER PRODUCTION 20
## 11 SCIENCE AND PUBLIC POLICY 19
## 12 JOURNAL OF TECHNOLOGY TRANSFER 18
## 13 AGRICULTURAL SYSTEMS 16
## 14 RENEWABLE \\& SUSTAINABLE ENERGY REVIEWS 13
## 15 R \\& D MANAGEMENT 12
## 16 ENVIRONMENTAL INNOVATION AND SOCIETAL TRANSITIONS 10
## 17 INDUSTRY AND INNOVATION 10
## 18 INNOVATION-MANAGEMENT POLICY \\& PRACTICE 10
## 19 ENVIRONMENT AND PLANNING C-GOVERNMENT AND POLICY 9
## 20 WORLD DEVELOPMENT 9
##
##
## Most Relevant Keywords
##
## Author Keywords (DE) Articles Keywords-Plus (ID) Articles
## 1 INNOVATION SYSTEM 66 TECHNOLOGY 66
## 2 INNOVATION 54 INNOVATION 65
## 3 REGIONAL INNOVATION SYSTEM 29 KNOWLEDGE 58
## 4 INNOVATION SYSTEMS 26 SYSTEMS 54
## 5 INNOVATION POLICY 24 NETWORKS 53
## 6 CHINA 21 POLICY 51
## 7 TECHNOLOGICAL INNOVATION SYSTEM 21 RESEARCH AND DEVELOPMENT 47
## 8 NATIONAL INNOVATION SYSTEM 16 FIRMS 46
## 9 NATIONAL SYSTEMS OF INNOVATION 14 INDUSTRY 43
## 10 REGIONAL INNOVATION SYSTEMS 13 PERSPECTIVE 43
## 11 NETWORKS 12 SCIENCE 41
## 12 POLICY 11 FRAMEWORK 38
## 13 R D 11 MANAGEMENT 32
## 14 SYSTEMS OF INNOVATION 11 PERFORMANCE 29
## 15 INSTITUTIONS 9 DYNAMICS 28
## 16 NATIONAL INNOVATION SYSTEMS 9 DIFFUSION 27
## 17 DEVELOPMENT 8 GROWTH 27
## 18 INNOVATION NETWORKS 8 RENEWABLE ENERGY TECHNOLOGY 23
## 19 LOCK IN 8 EVOLUTION 21
## 20 OPEN INNOVATION 8 SPILLOVERS 21
plot(results)
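Several of the indicators in the summary are simple ratios of the counts printed next to them, and recomputing them by hand helps with interpretation. The following is a minimal sketch: the formulas are my assumption about how the figures are derived (they reproduce the printed values, but are not taken from the package source), and all variable names are made up for this example.

```r
# Counts copied from the "Main Information about data" output above.
n_docs           <- 1000  # Documents
n_authors        <- 1830  # Authors
author_appear    <- 2397  # Author Appearances
authors_multi    <- 1615  # Authors of multi-authored documents
single_auth_docs <- 251   # Single-authored documents

# Assumed definitions of the derived indicators.
docs_per_author     <- n_docs / n_authors
authors_per_doc     <- n_authors / n_docs
coauthors_per_doc   <- author_appear / n_docs
collaboration_index <- authors_multi / (n_docs - single_auth_docs)

# Assumed growth rate: compound rate between the first and last listed year,
# over the number of listed years minus one.
first_year_articles <- 1   # 1975
last_year_articles  <- 15  # 2018
n_listed_years      <- 30  # rows in the "Annual Scientific Production" table
growth_rate <- ((last_year_articles / first_year_articles)^
                  (1 / (n_listed_years - 1)) - 1) * 100

print(round(c(docs_per_author     = docs_per_author,
              authors_per_doc     = authors_per_doc,
              coauthors_per_doc   = coauthors_per_doc,
              collaboration_index = collaboration_index,
              growth_rate         = growth_rate), 3))
```

Note that the growth rate depends heavily on the last year in the table, which is incomplete here because recent papers have had little time to enter the "1000 most cited" selection.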





Most Cited References (internally)
CR <- citations(M,
field = "article",
sep = ";")
cbind(CR$Cited[1:10])
## [,1]
## LUNDVALL B.-A, 1992, NATL SYSTEMS INNOVAT. 90
## NELSON R, 1993, NATL INNOVATION SYST. 78
## EDQUIST C., 1997, SYSTEMS INNOVATION T. 65
## BERGEK A, 2008, RES POLICY, V37, P407, DOI 10.1016/J.RESPOL.2007.12.003. 62
## FREEMAN C, 1987, TECHNOLOGY POLICY EC. 60
## CARLSSON B, 1991, J EVOLUTIONARY EC, V1, P93, DOI DOI 10.1007/BF01224915. 58
## COHEN WM, 1990, ADMIN SCI QUART, V35, P128, DOI 10.2307/2393553. 53
## HEKKERT MP, 2007, TECHNOL FORECAST SOC, V74, P413, DOI 10.1016/J.TECHFORE.2006.03.002. 50
## NELSON R, 1982, EVOLUTIONARY THEORY. 46
## CARLSSON B, 2002, RES POLICY, V31, P233, DOI 10.1016/S0048-7333(01)00138-X. 42
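Conceptually, counting the most cited references boils down to splitting the cited-reference strings and tabulating them. Here is a minimal base-R sketch of that idea on an invented toy vector `toy_cr`; it is not the implementation of citations(), just the same logic in miniature.

```r
# One cited-reference string per citing document, separated by ";" (toy data).
toy_cr <- c(
  "LUNDVALL BA, 1992, NATL SYSTEMS INNOVAT;NELSON R, 1993, NATL INNOVATION SYST",
  "NELSON R, 1993, NATL INNOVATION SYST;FREEMAN C, 1987, TECHNOLOGY POLICY EC",
  "LUNDVALL BA, 1992, NATL SYSTEMS INNOVAT;NELSON R, 1993, NATL INNOVATION SYST"
)

# Split into single references and strip surrounding whitespace.
refs <- trimws(unlist(strsplit(toy_cr, ";", fixed = TRUE)))

# Count how often each reference is cited within the collection.
cited <- sort(table(refs), decreasing = TRUE)
print(cited)
```

In real data the same work often appears under slightly different spellings, so the counts are only as good as the reference strings.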
Bibliographic Coupling Analysis: The Knowledge Frontier of the Field
Bibliographic coupling is a technique that has turned out to be very appropriate for capturing a field's current knowledge frontier. I will show you how to do it here, but in case you are interested, read my paper :)
NetMatrix <- biblioNetwork(M,
analysis = "coupling",
network = "references",
sep = ";")
net <- networkPlot(NetMatrix,
n = 50,
Title = "Bibliographic Coupling Network",
type = "fruchterman",
size.cex = TRUE,
size = 20,
remove.multiple = FALSE,
labelsize = 0.7,
edgesize = 10,
edges.min = 5)
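The idea behind bibliographic coupling is easy to see on a toy example: two documents are coupled when they share cited references, and the coupling strength is the number of shared references. A minimal sketch in base R, assuming a small invented document-by-reference matrix `A` (1 = the document cites the reference):

```r
# Rows are citing documents, columns are cited references (toy data).
A <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("refA", "refB", "refC", "refD")))

# Coupling strength: number of references two documents have in common.
coupling <- A %*% t(A)

# The diagonal holds each document's own reference count; drop it.
diag(coupling) <- 0
print(coupling)
```

Because a document's reference list is fixed at publication, coupling links exist from day one, which is what makes the measure usable for recent work.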

Co-citation Analysis: The Intellectual Structure and Knowledge Bases of the field
Citation analysis is one of the classic techniques in bibliometrics. It shows the structure of a specific field through the linkages between nodes (e.g. authors, papers, journals), while the edges are interpreted differently depending on the network type, namely co-citation, direct citation, or bibliographic coupling.
Below there are three examples.
- First, a co-citation network that shows relations between cited-reference works (nodes).
- Second, a co-citation network that uses cited journals as the unit of analysis. Useful dimensions for commenting on co-citation networks are: (i) centrality and peripherality of nodes, (ii) their proximity and distance, (iii) strength of ties, (iv) clusters, (v) bridging contributions.
- Third, a historiograph built on direct citations. It draws the intellectual linkages in historical order. The cited works of thousands of authors contained in a collection of published scientific articles are sufficient for reconstructing the historiographic structure of the field and for calling out its basic works.
Co-citation (cited references) analysis
Plot options:
- n = 50 (the function plots the main 50 cited references)
- type = “fruchterman” (the network layout is generated using the Fruchterman-Reingold Algorithm)
- size.cex = TRUE (the size of the vertices is proportional to their degree)
- size = 20 (the max size of vertices)
- remove.multiple=FALSE (multiple edges are not removed)
- labelsize = 0.7 (defines the size of vertex labels)
- edgesize = 10 (The thickness of the edges is proportional to their strength. Edgesize defines the max value of the thickness)
- edges.min = 5 (plots only edges with a strength greater than or equal to 5)
- all other arguments assume the default values
NetMatrix <- biblioNetwork(M,
analysis = "co-citation",
network = "references",
sep = ";")
net <- networkPlot(NetMatrix,
n = 50,
Title = "Co-Citation Network",
type = "fruchterman",
size.cex = TRUE,
size = 20,
remove.multiple = FALSE,
labelsize = 0.7,
edgesize = 10,
edges.min = 5)
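Co-citation is the mirror image of coupling: two references are linked when they appear together in the reference list of the same document. A minimal sketch in base R on an invented document-by-reference matrix `A` (1 = the document cites the reference):

```r
# Rows are citing documents, columns are cited references (toy data).
A <- matrix(c(1, 1, 0, 0,
              1, 1, 1, 0,
              0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("refA", "refB", "refC", "refD")))

# Co-citation strength: number of documents citing both references.
# crossprod(A) computes t(A) %*% A.
cocitation <- crossprod(A)

# The diagonal holds how often each reference is cited; drop it.
diag(cocitation) <- 0
print(cocitation)
```

Unlike coupling, co-citation links keep growing as new documents cite old references together, so the network describes the established knowledge base rather than the frontier.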

Cited Journal (Source) co-citation analysis
M <- metaTagExtraction(M, "CR_SO", sep=";")
NetMatrix <- biblioNetwork(M,
analysis = "co-citation",
network = "sources",
sep = ";")
net <-networkPlot(NetMatrix,
n = 50,
Title = "Co-Citation Network",
type = "auto",
size.cex = TRUE,
size = 15,
remove.multiple = FALSE,
labelsize = 0.7,
edgesize = 10,
edges.min = 5)

By the way, the results contain a “hidden” igraph object. That is new, and makes further analysis of the results possible. Great!
str(net, max.level = 2)
## List of 5
## $ graph :List of 10
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..- attr(*, "class")= chr "igraph"
## $ graph_pajek:List of 10
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..$ :List of 1
## ..- attr(*, "class")= chr "igraph"
## $ cluster_obj:List of 3
## ..$ merges : chr [1:20] "technol anal strateg" "technical change ec" "systems innovation t" "j evolutionary ec" ...
## ..$ modularity: chr [1:19] "am econ rev" "acad manage rev" "j econ lit" "technovation" ...
## ..$ membership: chr [1:11] "regional innovation." "european planning st" "oxford hdb innovatio" "j technology transfe" ...
## ..- attr(*, "class")= chr "communities"
## $ cluster_res:'data.frame': 50 obs. of 3 variables:
## ..$ vertex : Factor w/ 50 levels "acad manage rev",..: 45 43 41 24 37 38 10 16 14 48 ...
## ..$ cluster : num [1:50] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ btw_centrality: num [1:50] 0.1684 0.0178 0.0239 0.0806 1.2645 ...
## $ layout : num [1:50, 1:2] -0.265 -0.3487 -0.9116 -0.0946 0.0219 ...
net$graph
## IGRAPH b483ec8 UN-- 50 16723 --
## + attr: name (v/c), deg (v/n), size (v/n), label.cex (v/n), color (v/c), community (v/n), color (e/c), num
## | (e/n), width (e/n)
## + edges from b483ec8 (vertex names):
## [1] technol anal strateg--technical change ec technol anal strateg--technical change ec
## [3] technol anal strateg--technical change ec technol anal strateg--technical change ec
## [5] technol anal strateg--technical change ec technol anal strateg--technical change ec
## [7] technol anal strateg--technical change ec technol anal strateg--technical change ec
## [9] technol anal strateg--technical change ec technol anal strateg--systems innovation t
## [11] technol anal strateg--systems innovation t technol anal strateg--systems innovation t
## [13] technol anal strateg--systems innovation t technol anal strateg--systems innovation t
## + ... omitted several edges
Some summary statistics. I will only provide them here, but they are available for all objects created with biblioNetwork().
netstat <- networkStat(NetMatrix)
summary(netstat, k = 10)
##
##
## Main statistics about the network
##
## Size 7556
## Density 0.011
## Transitivity 0.175
## Diameter 6
## Degree Centralization 0.808
## Average path length 2.155
##
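To make statistics such as density and degree centralization less abstract, here is a minimal sketch that computes them by hand for a tiny undirected network. It uses the textbook definitions (Freeman's degree centralization), which I assume correspond to what is reported above; the network itself is invented.

```r
# Build a small undirected network as a symmetric adjacency matrix (toy data).
nodes <- c("a", "b", "c", "d")
adj <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
edges <- rbind(c("a", "b"), c("a", "c"), c("a", "d"), c("b", "c"))
for (i in seq_len(nrow(edges))) {
  adj[edges[i, 1], edges[i, 2]] <- 1
  adj[edges[i, 2], edges[i, 1]] <- 1
}

n       <- nrow(adj)
n_edges <- sum(adj) / 2          # each edge is counted twice in the matrix

# Density: share of all possible edges that are present.
density <- n_edges / (n * (n - 1) / 2)

# Degree of each node and Freeman's degree centralization
# (1 for a perfect star, 0 when all nodes have the same degree).
degree <- rowSums(adj)
centralization <- sum(max(degree) - degree) / ((n - 1) * (n - 2))

print(c(density = density, centralization = centralization))
```

A high centralization combined with a low density, as in the output above, points to a network organised around a few hub nodes.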
Historiograph - Direct citation linkages
We can also look at a historiograph of citation patterns over time.
histResults <- histNetwork(M,
min.citations = quantile(M$TC,0.75, na.rm = TRUE),
sep = ";")
## Articles analysed 100
## Articles analysed 127
net <- histPlot(histResults,
n = 20,
size.cex=TRUE,
size = 5,
labelsize = 3,
arrowsize = 0.5)

##
## Legend
##
## Paper DOI Year LCS GCS
## 1992 - 1 COOKE P, 1992, GEOFORUM 10.1016/0016-7185(92)90048-9 1992 12 343
## 1997 - 7 COOKE P, 1997, RES POLICY 10.1016/S0048-7333(97)00025-5 1997 41 836
## 1998 - 9 MEYER-KRAHMER F, 1998, RES POLICY 10.1016/S0048-7333(98)00094-8 1998 6 390
## 1999 - 15 EDQUIST C, 1999, TECHNOL SOC 10.1016/S0160-791X(98)00037-2 1999 7 127
## 2001 - 24 LIU XL, 2001, RES POLICY 10.1016/S0048-7333(00)00132-3 2001 14 292
## 2001 - 25 KAUFMANN A, 2001, RES POLICY 10.1016/S0048-7333(00)00118-9 2001 7 182
## 2002 - 31 MALERBA F, 2002, RES POLICY 10.1016/S0048-7333(01)00139-1 2002 28 840
## 2002 - 32 LUNDVALL BA, 2002, RES POLICY 10.1016/S0048-7333(01)00137-8 2002 19 488
## 2002 - 34 FREEMAN C, 2002, RES POLICY 10.1016/S0048-7333(01)00136-6 2002 10 219
## 2003 - 43 MOULAERT F, 2003, REG STUD 10.1080/0034340032000065442 2003 9 443
## 2004 - 49 GEELS FW, 2004, RES POLICY 10.1016/J.RESPOL.2004.01.015 2004 13 950
## 2005 - 57 ASHEIM BT, 2005, RES POLICY 10.1016/J.RESPOL.2005.03.013 2005 15 554
## 2005 - 59 IAMMARINO S, 2005, EUR PLAN STUD 10.1080/09645310500107084 2005 6 122
## 2006 - 67 SHARIF N, 2006, RES POLICY 10.1016/J.RESPOL.2006.04.001 2006 7 165
## 2006 - 70 LEYDESDORFF L, 2006, RES POLICY 10.1016/J.RESPOL.2006.09.027 2006 6 81
## 2008 - 81 BERGEK A, 2008, RES POLICY 10.1016/J.RESPOL.2007.12.003 2008 62 571
## 2008 - 86 KLERKX L, 2008, FOOD POLICY 10.1016/J.FOODPOL.2007.10.001 2008 6 126
## 2010 - 106 COENEN L, 2010, J CLEAN PROD 10.1016/J.JCLEPRO.2010.04.003 2010 8 95
## 2012 - 113 WEBER KM, 2012, RES POLICY 10.1016/J.RESPOL.2011.10.015 2012 8 177
## 2014 - 126 BINZ C, 2014, RES POLICY 10.1016/J.RESPOL.2013.07.002 2014 7 92
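The legend distinguishes LCS (local citation score: citations received from documents inside the collection) from GCS (global citation score: all citations recorded in the database). A minimal sketch of how LCS follows from a direct-citation matrix, with invented papers and invented GCS values:

```r
# cites[i, j] = 1 means document i cites document j; both are in the collection.
docs <- c("P1992", "P1997", "P2002", "P2008")
cites <- matrix(0, 4, 4, dimnames = list(citing = docs, cited = docs))
cites["P1997", "P1992"] <- 1
cites["P2002", "P1992"] <- 1
cites["P2002", "P1997"] <- 1
cites["P2008", "P1997"] <- 1
cites["P2008", "P2002"] <- 1

# Local citation score: citations received from within the collection.
lcs <- colSums(cites)

# Global citation score: total citations in the database (invented numbers).
gcs <- c(P1992 = 300, P1997 = 800, P2002 = 500, P2008 = 550)

print(data.frame(paper = docs, LCS = lcs, GCS = gcs, row.names = NULL))
```

A paper with a high GCS but a low LCS is influential in general, yet less central to this particular field's own conversation.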
The conceptual structure and context - Co-Word Analysis
Co-word networks show the conceptual structure, uncovering links between concepts through term co-occurrences.
Conceptual structure is often used to understand the topics covered by scholars (so-called research front) and identify what are the most important and the most recent issues.
Dividing the whole timespan in different timeslices and comparing the conceptual structures is useful to analyze the evolution of topics over time.
Bibliometrix is able to analyze keywords, but also the terms in the articles’ titles and abstracts. It does so using network analysis, correspondence analysis (CA), or multiple correspondence analysis (MCA). CA and MCA visualise the conceptual structure in a two-dimensional plot.
We can do even fancier stuff with abstracts or full texts (and we do). However, I don't want to spoil Roman's sessions, so I will hold myself back here.
Co-word Analysis through Keyword co-occurrences
Plot options:
- normalize = “association” (the vertex similarities are normalized using association strength)
- n = 50 (the function plots the main 50 keywords)
- type = “fruchterman” (the network layout is generated using the Fruchterman-Reingold Algorithm)
- size.cex = TRUE (the size of the vertices is proportional to their degree)
- size = 20 (the max size of the vertices)
- remove.multiple=FALSE (multiple edges are not removed)
- labelsize = 3 (defines the max size of vertex labels)
- label.cex = TRUE (The vertex label sizes are proportional to their degree)
- edgesize = 10 (The thickness of the edges is proportional to their strength. Edgesize defines the max value of the thickness)
- label.n = 50 (Labels are plotted only for the main 50 vertices)
- edges.min = 2 (plots only edges with a strength greater than or equal to 2)
- all other arguments assume the default values
NetMatrix <- biblioNetwork(M,
analysis = "co-occurrences",
network = "keywords",
sep = ";")
net <- networkPlot(NetMatrix,
normalize = "association",
n = 50,
Title = "Keyword Co-occurrences",
type = "fruchterman",
size.cex = TRUE, size = 20, remove.multiple = FALSE,
edgesize = 10,
labelsize = 3,
label.cex = TRUE,
label.n = 50,
edges.min = 2)
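What the co-word network and the "association" normalization do can be sketched in a few lines of base R. The keyword strings below are invented, and the normalization shown is the usual association strength (co-occurrences divided by the product of the two keywords' occurrences), which I assume is what normalize = "association" refers to.

```r
# One keyword string per document, separated by ";" (toy data).
toy_de <- c("INNOVATION SYSTEM;POLICY;CHINA",
            "INNOVATION SYSTEM;POLICY",
            "INNOVATION SYSTEM;NETWORKS",
            "NETWORKS;POLICY")

kw_list  <- strsplit(toy_de, ";", fixed = TRUE)
keywords <- sort(unique(unlist(kw_list)))

# Document-by-keyword incidence matrix (1 = keyword appears in the document).
W <- t(sapply(kw_list, function(k) as.numeric(keywords %in% k)))
colnames(W) <- keywords

# Keyword co-occurrence matrix; the diagonal holds keyword occurrences.
cooc <- crossprod(W)
occ  <- diag(cooc)

# Association strength: co-occurrences relative to what the two keywords'
# frequencies would lead us to expect.
assoc <- cooc / outer(occ, occ)
diag(assoc) <- 0
print(round(assoc, 3))
```

The normalization matters because frequent keywords co-occur with almost everything; dividing by their occurrences keeps them from dominating the layout.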

Thematic Map
Co-word analysis draws clusters of keywords. They are considered as themes, whose density and centrality can be used in classifying themes and mapping in a two-dimensional diagram.
Thematic map is a very intuitive plot and we can analyze themes according to the quadrant in which they are placed: (1) upper-right quadrant: motor-themes; (2) lower-right quadrant: basic themes; (3) lower-left quadrant: emerging or disappearing themes; (4) upper-left quadrant: very specialized/niche themes.
Please see Cobo, M. J., López-Herrera, A. G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets theory field. Journal of Informetrics, 5(1), 146-166.
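The quadrant logic can be sketched as a simple classification rule. The themes and their scores below are invented, and splitting each axis at its median is my assumption for this illustration, not necessarily how thematicMap() draws the axes.

```r
# Toy themes with invented centrality and density scores.
themes <- data.frame(
  theme      = c("A", "B", "C", "D"),
  centrality = c(0.9, 0.8, 0.2, 0.1),
  density    = c(0.7, 0.2, 0.1, 0.9)
)

# Assumed cut-offs: the median of each dimension.
c_cut <- median(themes$centrality)
d_cut <- median(themes$density)

# Assign each theme to a quadrant of the thematic map.
themes$quadrant <- with(themes,
  ifelse(centrality >= c_cut & density >= d_cut, "motor theme",
  ifelse(centrality >= c_cut, "basic theme",
  ifelse(density >= d_cut, "niche theme",
         "emerging or disappearing theme"))))

print(themes)
```

Centrality is read on the horizontal axis (how strongly a theme connects to other themes) and density on the vertical axis (how tightly its own keywords hang together).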
NetMatrix <- biblioNetwork(M,
analysis = "co-occurrences",
network = "keywords",
sep = ";")
S <- normalizeSimilarity(NetMatrix,
type = "association")
net <- networkPlot(S,
n = 500,
Title = "Keyword co-occurrences",
type = "fruchterman",
labelsize = 2,
halo = FALSE,
cluster = "walktrap",
remove.isolates = FALSE,
remove.multiple = FALSE,
noloops = TRUE,
weighted = TRUE,
label.cex = TRUE,
edgesize = 5,
size = 1,
edges.min = 2)

Map <- thematicMap(M,
                   minfreq = 5)
plot(Map$map)

Let's inspect the clusters we found:
clusters <- Map$words %>%
arrange(Cluster, desc(Occurrences))
clusters %>%
select(Cluster, Words, Occurrences) %>%
group_by(Cluster) %>%
mutate(n.rel = Occurrences / sum(Occurrences) ) %>%
slice(1:3)
The social structure - Collaboration Analysis
Collaboration networks show how authors, institutions (e.g. universities or departments) and countries relate to each other in a specific field of research. For example, the first figure below is a co-author network. It reveals regular study groups, hidden groups of scholars, and pivotal authors. The second figure is called “Edu collaboration network” and uncovers relevant institutions in a specific research field and their relations.
Author collaboration network
NetMatrix <- biblioNetwork(M %>% filter(!grepl("GESCHWIND", AU)),
analysis = "collaboration",
network = "authors",
sep = ";")
S <- normalizeSimilarity(NetMatrix, type = "jaccard")
net <- networkPlot(S,
n = 50,
Title = "Author collaboration",
type = "auto",
size = 10,
weighted = TRUE,
remove.isolates = TRUE,
size.cex = TRUE,
edgesize = 1,
labelsize = 0.6)
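The Jaccard normalization used above rescales raw co-authorship counts by how much the two authors publish overall. A minimal sketch in base R, assuming the standard Jaccard definition (shared papers divided by the papers of either author) and an invented collaboration matrix:

```r
# Diagonal: papers per author; off-diagonal: co-authored papers (toy data).
authors <- c("AUTHOR A", "AUTHOR B", "AUTHOR C")
collab <- matrix(c(5, 2, 0,
                   2, 4, 1,
                   0, 1, 3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(authors, authors))

papers <- diag(collab)

# Jaccard similarity: shared papers / (papers of i + papers of j - shared).
jaccard <- collab / (outer(papers, papers, "+") - collab)
diag(jaccard) <- 0
print(round(jaccard, 3))
```

With raw counts, prolific authors look like everybody's closest collaborator; the normalized values instead highlight pairs who write most of their papers together.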

Edu collaboration network
NetMatrix <- biblioNetwork(M,
analysis = "collaboration",
network = "universities",
sep = ";")
net <- networkPlot(NetMatrix,
n = 50,
Title = "Edu collaboration",
type = "auto",
size = 10,
size.cex = TRUE,
edgesize = 3,
labelsize = 0.6)

Country collaboration network
M <- metaTagExtraction(M,
Field = "AU_CO",
sep = ";")
NetMatrix <- biblioNetwork(M,
analysis = "collaboration",
network = "countries",
sep = ";")
net <- networkPlot(NetMatrix,
                   n = dim(NetMatrix)[1],
                   Title = "Country collaboration",
                   type = "sphere",
                   cluster = "louvain",
                   weighted = TRUE,
                   size = 10,
                   size.cex = TRUE,
                   edgesize = 1,
                   labelsize = 0.6)

Isn’t that all a lot of fun?
By now you should have realized that different levels of projection and aggregation offer almost endless possibilities for the analysis of bibliographic data!
By the way: we can also do all of that with tidygraph and ggraph.
g <- NetMatrix %>% as.matrix() %>% as_tbl_graph(directed = FALSE)
g
## # A tbl_graph: 47 nodes and 174 edges
## #
## # An undirected multigraph with 5 components
## #
## # Node Data: 47 x 1 (active)
## name
## <chr>
## 1 NETHERLANDS
## 2 ITALY
## 3 SPAIN
## 4 UNITED KINGDOM
## 5 GERMANY
## 6 SWEDEN
## # ... with 41 more rows
## #
## # Edge Data: 174 x 3
## from to weight
## <int> <int> <dbl>
## 1 1 1 80
## 2 1 2 1
## 3 1 4 9
## # ... with 171 more rows
g <- g %N>%
mutate(community = as.factor(group_louvain(weights = weight)))
g %N>%
mutate(dgr = centrality_degree(weights = weight)) %>%
arrange(desc(dgr)) %>%
slice(1:200) %>%
ggraph(layout = 'fr') +
geom_edge_link(aes(width = weight), alpha = 0.2, colour = "grey") +
geom_node_point(aes(colour = community, size = dgr)) +
geom_node_text(aes(label = name), size = 1, repel = FALSE) +
theme_graph()
